
Top NLP papers

  1. Attention is All You Need

  2. LSTM: Long Short-Term Memory

    Offers:

    • A solution to the vanishing gradient problem in RNNs
    • Gated memory cells that let the network learn long-term dependencies (a one-step cell sketch follows below)
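
    A minimal NumPy sketch of one step of a (modern, forget-gate) LSTM cell, to make the gating and additive cell-state update concrete; the layer sizes, the single stacked weight matrix W, and the function names are illustrative choices, not the paper's notation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*hidden, input+hidden), b: (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b   # all gate pre-activations at once
    i = sigmoid(z[0*hidden:1*hidden])         # input gate
    f = sigmoid(z[1*hidden:2*hidden])         # forget gate
    o = sigmoid(z[2*hidden:3*hidden])         # output gate
    g = np.tanh(z[3*hidden:4*hidden])         # candidate cell update
    c = f * c_prev + i * g                    # additive cell-state update: gradients flow
                                              # through f * c_prev, easing vanishing gradients
    h = o * np.tanh(c)                        # hidden state passed to the next step
    return h, c

# toy usage: input size 3, hidden size 4 (hypothetical sizes)
rng = np.random.default_rng(0)
x, h, c = rng.normal(size=3), np.zeros(4), np.zeros(4)
W, b = rng.normal(size=(16, 7)) * 0.1, np.zeros(16)
h, c = lstm_step(x, h, c, W, b)
```
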
  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Offers:

    • A pre-training approach for NLP: pre-train on unlabeled text, then fine-tune the same model on specific tasks such as question answering and sentiment analysis
    • Achieved state-of-the-art results on 11 NLP tasks
    • BERT is a transformer-based model
    • BERT is trained on two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP); the MLM masking rule is sketched below
    • BERT is trained on a large corpus of text data (BooksCorpus and Wikipedia)
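
    A minimal sketch of the MLM corruption rule described in the paper (about 15% of positions are picked; of those, 80% become [MASK], 10% become a random token, 10% are left unchanged). It works on plain token strings rather than WordPiece ids, and mask_tokens and its arguments are hypothetical names:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption: pick ~15% of positions; of those,
    80% -> [MASK], 10% -> random token, 10% -> left unchanged.
    The model is trained to predict the original token at each picked position."""
    rng = random.Random(seed)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                       # prediction target
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)  # random replacement
            # else: keep the original token
    return corrupted, labels

toks = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(toks, vocab=toks))
```
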
  4. Word2Vec

    Offers:

    • A method to learn word embeddings
    • Word2Vec is a shallow neural network model
    • Word2Vec has two models: Continuous Bag of Words (CBOW), which predicts the center word from its context, and Skip-gram, which predicts the context words from the center word (see the sketch below)
    • Word2Vec is trained on a large corpus of text data
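
    A small sketch of how the two variants differ in the training examples they build from a context window; training_examples is a hypothetical helper, and no actual embedding training is shown:

```python
def training_examples(tokens, window=2):
    """Build (input, target) examples for both Word2Vec variants.
    Skip-gram: predict each context word from the center word.
    CBOW: predict the center word from the bag of its context words."""
    skipgram, cbow = [], []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        skipgram += [(center, ctx) for ctx in context]
        cbow.append((context, center))
    return skipgram, cbow

sg, cb = training_examples("the cat sat on the mat".split())
print(sg[:4])  # [('the', 'cat'), ('the', 'sat'), ('cat', 'the'), ('cat', 'sat')]
print(cb[0])   # (['cat', 'sat'], 'the')
```
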
  5. GloVe: Global Vectors for Word Representation

    Offers:

    • A method to learn word embeddings, similar to Word2Vec
    • GloVe is a matrix factorization technique
    • Word2Vec uses local context windows to learn word embeddings, while GloVe uses global co-occurrence statistics
    • Word2Vec uses a shallow neural network model, while GloVe factorizes a word co-occurrence matrix (a count-based model fit by weighted least-squares regression of word-vector dot products onto log co-occurrence counts; the objective is sketched below)
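
    A sketch of the GloVe objective as a weighted least-squares loss over observed co-occurrence counts; the toy vocabulary, random vectors, and the glove_loss helper are illustrations only, and no optimizer is included:

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """GloVe objective: weighted least-squares regression of word-vector dot products
    onto log co-occurrence counts,
    J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):                      # only observed co-occurrences
        weight = min(1.0, (X[i, j] / x_max) ** alpha)     # f(X_ij): down-weights rare pairs, caps frequent ones
        error = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        loss += weight * error ** 2
    return loss

# toy usage with a hypothetical 3-word vocabulary
rng = np.random.default_rng(0)
V, d = 3, 5
X = np.array([[0, 4, 1], [4, 0, 2], [1, 2, 0]], dtype=float)  # co-occurrence counts
print(glove_loss(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                 np.zeros(V), np.zeros(V), X))
```
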
  6. RNN Encoder-Decoder: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

    Offers:

    • A model architecture for seq2seq tasks
    • Encoder-Decoder is used in machine translation, text summarization, and other seq2seq tasks
    • Encoder-Decoder is trained to map an input sequence to an output sequence: the encoder compresses the input into a fixed-length context vector, and the decoder generates the output conditioned on it (see the sketch below)
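
    A toy NumPy sketch of that idea: the encoder's final hidden state becomes the fixed-length context vector, and the decoder is conditioned on it at every step. The paper uses gated (GRU-like) units and also feeds the previous output back into the decoder; plain tanh RNNs with random weights are used here only to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 4   # hypothetical sizes
W_xh, W_hh = rng.normal(size=(d_h, d_in)) * 0.1, rng.normal(size=(d_h, d_h)) * 0.1
W_ch, W_hy = rng.normal(size=(d_h, d_h)) * 0.1, rng.normal(size=(d_out, d_h)) * 0.1

def encode(inputs):
    """Run the encoder RNN; its final hidden state is the fixed-length context vector c."""
    h = np.zeros(d_h)
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h

def decode(c, steps):
    """Generate `steps` output vectors, each step conditioned on the context vector c."""
    h, outputs = np.zeros(d_h), []
    for _ in range(steps):
        h = np.tanh(W_ch @ c + W_hh @ h)   # the context vector is fed in at every decoder step
        outputs.append(W_hy @ h)
    return outputs

source = [rng.normal(size=d_in) for _ in range(5)]
print(decode(encode(source), steps=3)[0])
```
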
  7. Attention Mechanism

    Offers:

    • A mechanism to improve the performance of seq2seq models by removing the fixed-length context-vector bottleneck (a one-step sketch follows below)
    • Attention mechanism is used in machine translation, text summarization, and other seq2seq tasks
    • Attention mechanism allows the model to focus on different parts of the input sequence when generating the output sequence
    • Attention mechanism is used in conjunction with RNNs and transformers
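
    A NumPy sketch of one attention step using dot-product scoring (Bahdanau et al. actually score with a small additive MLP, but the weighting-and-summing idea is the same); the decoder state and encoder states here are random placeholders:

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current decoder
    state, softmax the scores into weights, and return the weighted sum (context)."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = weights @ encoder_states        # convex combination of encoder states
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 8))          # 6 source positions, hidden size 8 (hypothetical)
ctx, w = attention(rng.normal(size=8), enc)
print(w.round(2), ctx.shape)           # weights sum to 1; context has the encoder hidden size
```
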
  8. Seq2Seq: Sequence to Sequence Learning with Neural Networks

    Offers:

    • A model architecture for seq2seq tasks
    • Seq2Seq is used in machine translation, text summarization, and other seq2seq tasks
    • Instead of an RNN Encoder-Decoder, Seq2Seq uses an LSTM Encoder-Decoder
    • Other differences between Seq2Seq and RNN Encoder-Decoder include the use of word embeddings and beam search
    • The source sentence is reversed: instead of mapping the sentence a, b, c to the sentence α, β, γ, the LSTM is asked to map c, b, a to α, β, γ, where α, β, γ is the translation of a, b, c (see the sketch below)
    • Reversing introduces many short-range dependencies between source and target words, which makes it easy for SGD to “establish communication” between the input and the output
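
    A tiny sketch of that data-preparation trick, with a hypothetical make_training_pair helper and an <EOS> end-of-sequence marker as in the paper:

```python
def make_training_pair(source_tokens, target_tokens):
    """Seq2Seq training pair with the source reversed: the last source word ends up
    right next to the first target word, creating the short-range dependencies that
    the paper reports make optimization with SGD much easier."""
    return list(reversed(source_tokens)), target_tokens + ["<EOS>"]

src, tgt = make_training_pair("a b c".split(), "alpha beta gamma".split())
print(src, tgt)   # ['c', 'b', 'a'] ['alpha', 'beta', 'gamma', '<EOS>']
```
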
  9. BLEU: A Method for Automatic Evaluation of Machine Translation

    Offers:

    • A metric to evaluate the performance of machine translation systems
    • BLEU is used to compare the output of a machine translation system with a reference translation
    • BLEU is based on the modified (clipped) precision of n-grams in the output translation, combined with a brevity penalty for overly short outputs (see the sketch below)
    • BLEU is widely used in machine translation research and development
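
    A from-scratch sketch of sentence-level BLEU with clipped n-gram precisions and a brevity penalty; the real metric is computed over a whole corpus and supports multiple references, which this toy version omits:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Geometric mean of modified (clipped) n-gram precisions for n = 1..max_n,
    multiplied by a brevity penalty that punishes candidates shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())  # clipped counts
        precisions.append(overlap / max(1, sum(cand.values())))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))        # brevity penalty
    return bp * math.exp(log_avg)

ref = "the cat is on the mat".split()
print(bleu("the cat sat on the mat".split(), ref))  # 0.0: no 4-gram matches in this short sentence
print(bleu(ref, ref))                               # 1.0: identical to the reference
```
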
  10. GPT-3: Language Models are Few-Shot Learners